The advent of XML (eXtensible Markup Language) has provided a standards-based mechanism for exchanging data between computer systems. XML, as the name implies, is extensible; that is, the format in which the data is stored can be adapted to suit the data source. While this is one of the strengths of XML, it also causes problems when importing data from one system into another if the data formats do not match exactly. For example, consider this XML snippet describing a work of art in an imaginary Catalogue:
<table name="ecatalogue">
<tuple>
<atom column="TitMainTitle">An imaginary work of Art</atom>
<atom column="CreDateCreated">1995-07-02</atom>
<table column="CreCreatorRef_tab">
<tuple>
<atom column="NamLast">Citizen</atom>
<atom column="NamFirst">John</atom>
</tuple>
</table>
</tuple>
</table>
You receive this data from another institution using EMu and want to import it into your system, but some column names in your system differ from those used by the originating institution. For example, in your Catalogue the Title column may be called SumTitle and the Date Created column may be called SumDateCreated. Before you can load the XML into your system, it must be transformed so that it looks like this:
<table name="ecatalogue">
<tuple>
<atom column="SumTitle">An imaginary work of Art</atom>
<atom column="SumDateCreated">1995-07-02</atom>
<table column="CreCreatorRef_tab">
<tuple>
<atom column="NamLast">Citizen</atom>
<atom column="NamFirst">John</atom>
</tuple>
</table>
</tuple>
</table>
One way to make the change is to use a text editor and replace all instances of TitMainTitle with SumTitle and CreDateCreated with SumDateCreated. If the amount of data is small or the import will occur only once, this solution is feasible. If, however, a number of imports will occur in which the data is supplied in the same format, it makes sense to use XSLT (eXtensible Stylesheet Language Transformations) to apply the changes before the data is loaded. XSLT is an XML-based language for transforming XML documents.
For example, the following script can be used to perform the required column renaming outlined above:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:map="urn:map" version="1.0">
<!-- Output in XML format -->
<xsl:output method="xml" encoding="utf-8"/>
<!-- Mapping table of old names to new names -->
<map:entries>
<map:entry oldname="TitMainTitle" newname="SumTitle"/>
<map:entry oldname="CreDateCreated" newname="SumDateCreated"/>
</map:entries>
<xsl:variable name="map" select="document('')/*/map:entries/*"/>
<!-- Copy every element as-is. Attributes are handled by
the next template. -->
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Special handling of attributes. -->
<xsl:template match="@*">
<xsl:variable name="entry" select="$map[@oldname = current()]"/>
<xsl:choose>
<xsl:when test="name() = 'column' and $entry">
<xsl:attribute name="column">
<xsl:value-of select="$entry/@newname"/>
</xsl:attribute>
</xsl:when>
<xsl:otherwise>
<xsl:copy/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
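As a cross-check on what the stylesheet is meant to produce, the same renaming can be sketched with Python's standard library. This is not XSLT and is not part of EMu; it is only an illustrative equivalent, and the names COLUMN_MAP, rename_columns and SAMPLE are hypothetical. It assumes the mapping applies only to the value of the column attribute, as in the stylesheet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping, mirroring the map:entry elements in the stylesheet.
COLUMN_MAP = {
    "TitMainTitle": "SumTitle",
    "CreDateCreated": "SumDateCreated",
}


def rename_columns(xml_text, column_map=COLUMN_MAP):
    """Return xml_text with every mapped 'column' attribute value renamed."""
    root = ET.fromstring(xml_text)
    # iter() visits the root element and all of its descendants.
    for element in root.iter():
        old_name = element.get("column")
        # Only 'column' attributes listed in the mapping are changed;
        # every other attribute and element is left untouched.
        if old_name in column_map:
            element.set("column", column_map[old_name])
    return ET.tostring(root, encoding="unicode")


# The sample record from the start of this article.
SAMPLE = """<table name="ecatalogue">
<tuple>
<atom column="TitMainTitle">An imaginary work of Art</atom>
<atom column="CreDateCreated">1995-07-02</atom>
<table column="CreCreatorRef_tab">
<tuple>
<atom column="NamLast">Citizen</atom>
<atom column="NamFirst">John</atom>
</tuple>
</table>
</tuple>
</table>"""

if __name__ == "__main__":
    print(rename_columns(SAMPLE))
```

Running the sketch on the sample record should give the transformed XML shown earlier, with TitMainTitle and CreDateCreated replaced and all other columns unchanged. For repeated imports the XSLT approach remains preferable, because the stylesheet can be reused by any XSL engine.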
To execute the XSLT script, an XSL engine is required. A number of products provide XSL engines that can be used to transform the XML for loading into EMu. When a file is received from an institution, the only extra step is to run the transformation before importing the XML into EMu.
To streamline this process, XSLT processing has been added as part of the Import tool for XML files: it is possible to import an XML file and have it transformed as part of the Import process. The XSLT file used to transform the XML can be stored on your local machine (local file) or on the EMu server (pre-configured file). Files stored on the EMu server are available to all users. In general, the pre-configured files are "standard" transformations used to manipulate data from known sources. A known source can be:
- a standard format (e.g. Darwin Core or Dublin Core)
- a repeatable format (e.g. EMu export format, BRAHMS)
Using repeatable formats, it is possible to define XSLT files that allow for easy import of data from other EMu clients for customised modules such as the Catalogue, Taxonomy and Collection Events.